We describe the architecture and initial implementation of the next generation of Grid Data Management
Middleware in the EU DataGrid (EDG) project.

The new architecture stems from our experience together with the user requirements gathered during the two 
years of running our initial set of Grid Data Management Services. All of our new services are based on the Web 
Service technology paradigm, very much in line with the emerging Open Grid Services Architecture (OGSA). 
We have modularized our components and invested a great amount of effort in developing secure, extensible 
and robust services, starting from the design but also using a streamlined build and testing framework. 
Our service components are: Replica Location Service, Replica Metadata Service, Replica Optimization Service, 
Replica Subscription and high-level replica management. The service security infrastructure is fully GSI-enabled, 
hence compatible with the existing Globus Toolkit 2-based services; moreover, it allows for fine-grained autho- 
rization mechanisms that can be adjusted depending on the service semantics. 



1. Introduction 

The EU DataGrid project (also referred to as
EDG in this article) is now in its third and final year.
Within the data management work package we have
developed a second generation of data management
services that will be deployed in EDG release 2.x. Our
first generation replication tools (GDMP,
edg-replica-manager, etc.) provided a very good base
and input, which we have reported on previously. The
experience we gained with the first generation of tools
(mainly written in C++) is directly used in the second
generation of data management services, which are based
on web service technologies and mainly implemented in Java.

The basic design concepts in the second generation 
services are as follows: 

• Modularity: 

The design needs to be modular and allow for
easy plug-ins and future extensions.

In addition, we should use generally agreed
standards and not rely on vendor-specific
solutions.

• Evolution:

Since OGSA is an upcoming standard that is
most likely to be adopted by several Grid
services in the future, the design should allow for
an easy adoption of the OGSA concept. It is
also advisable to use a similar technology.

In addition, the design should be independent of
the underlying operating system as well as of the
relational database management system used by
our services.

Having implemented the first generation tools 
mainly in C++, the technology choices for the sec- 
ond generation services presented in this article are as 
follows: 

• Java based servers are used that host web ser- 
vices (mainly Jakarta's Tomcat as well as Oracle 
9iAS for certain applications). 

• Interface definitions in WSDL 

• Client stubs for several programming languages 
(Java, C/C++) through SOAP using AXIS for 
Java and gSOAP for C++ interfaces. 



TUAT008 






Computing in High Energy Physics (CHEP 2003), La Jolla, California, March 24 - 28, 2003 



• Persistent service data is stored in a relational 
database management system. We mainly use 
MySQL for general services that require open 
source technology and Oracle for more robust 
services. 

The entire set of data management services consists 
of the following parts: 

• Replication service framework: This service
framework is the main part of our data
management services and is described in detail
in Section 2. It basically consists of an overall
replica management system that uses several
other services such as the Replica Location
Service, the Replica Optimization Service, etc.

• SQL Database Service (Spitfire): Spitfire 
provides a means to access relational databases 
from the Grid. 

• Java Security Package: All of our services 
have very strict security requirements. The Java 
security package provides tools that can be used 
in Grid services such as our replication services. 

All these components are discussed in detail in the
following sections, in the order listed above, which
also outlines the organization of the paper.

2. Replication Service Framework 'Reptor'

In the following section we first give an architectural 
overview of the entire replication framework and then 
discuss individual services (Replica Location Service, 
Replica Optimization Service etc.) in more detail. 

2.1. General Overview of Replication Architecture

Figure 1 presents the user's perspective of the main
components of a replica management system, for which
we have given the code name 'Reptor'. This design,
which was first discussed previously, represents an
evolution of our original design. Several of
the components have already been implemented and
tested in EDG (see shaded components) whereas others
(in white) are still in the design phase and might
be implemented in the future.

Reptor has been realized as a modular system that
allows third party components to be plugged in easily.
Reptor defines the minimal interface third party
components have to provide. According to this design the
entire framework is provided by the Replica
Management Service, which acts as a logical single entry
point to the system and interacts with the other
components of the system as follows:



• The Core module provides the main function- 
ality of replica management, namely replica cre- 
ation, deletion, and cataloging by interacting 
with third party modules such as transport and 
replica and metadata catalog services. 

• The goal of the Optimization component (im- 
plemented as a service) is to minimize file access 
times by pointing access requests to appropriate 
replicas and pro-actively replicating frequently 
used files based on gathered access statistics. 

• The Security module manages the required 
user authentication and authorization, in par- 
ticular, issues pertaining to whether a user is 
allowed to create, delete, read, and write a file. 

• Collections are defined as sets of logical file- 
names and other collections. 

• The Consistency module maintains consis- 
tency between all replicas of a given file, as well 
as between the meta information stored in the 
various catalogs. 

• The Session component provides generic check- 
pointing, restart, and rollback mechanisms to 
add fault tolerance to the system. 

• The Subscription service allows for a publish- 
subscribe model for replica creation. 

We decided to implement the Replica Management 
Service and the core module functionality on the client 
side in the Replica Manager Client, henceforth re- 
ferred to as the Replica Manager. The other subser- 
vices and APIs are modules and services in their own 
right, allowing for a multitude of deployment scenarios 
in a distributed environment. 

One advantage of such a design is that if a sub- 
service is unavailable, the Replica Manager can still 
provide all the functionality that does not make use 
of that particular service. Also, critical service com- 
ponents may have more than one instance to provide 
a higher level of availability and to avoid service bot- 
tlenecks. 

A detailed description of the implemented
components and services can be found in the following
subsections as well as in the original design.

2.2. Interaction with Services 

Figure 1: Reptor's main design components.

The Replica Manager needs to interact with many
external services as well as internal ones, such as
the Information Service and transport mechanisms
like GridFTP servers. Most of the components
required by the Replica Manager are independent
services, hence appropriate client stubs satisfying the
interface need to be provided by the service. The actual
component to be used can be specified by means of
configuration files, and Java dynamic class loading
features are exploited to make the components available
at execution time.
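The configuration-driven loading described above can be illustrated with a minimal sketch. It is written in Python rather than Java purely for brevity, and the configuration keys and the standard-library classes used as stand-ins for service stubs are assumptions, not part of Reptor.

```python
import importlib

# Hypothetical configuration: maps a component role to "module:ClassName".
# The standard-library classes below merely stand in for real client stubs.
CONFIG = {
    "catalog": "collections:OrderedDict",
    "transport": "io:StringIO",
}


def load_component(role, config=CONFIG):
    """Resolve the class configured for `role` at run time and return it."""
    module_name, class_name = config[role].split(":")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# The caller only knows the role name; the concrete class comes from config.
catalog_cls = load_component("catalog")
catalog = catalog_cls()
```

Swapping an implementation then only requires changing the configuration entry, not the calling code.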

To date, the Replica Manager has been tested using 
the following components: 

• Replica Location Service (RLS): used for
locating replicas in the Grid and assigning
physical file names.

• Replica Metadata Catalog (RMC): used for 
querying and assigning logical file names. 

• Replica Optimization Service (ROS): used for lo- 
cating the best replica to access. 

• R-GMA: an information service provided by
EDG. The Replica Manager uses R-GMA to obtain
information about Storage and Computing
Elements [7].

• Globus C based libraries as well as CoG [12],
providing GridFTP transport functionality.

• The EDG network monitoring services: EDG 
(in particular WP7) provides these services to 
obtain statistics and network characteristics. 

The implementation is mainly done using the Java
J2EE framework and associated web service
technologies (the Apache Tomcat servlet container, Jakarta
Axis, etc.). In more detail, we use client/server
architectures making SOAP Remote Procedure Calls
(RPC) over HTTPS. The basic component interaction
is shown in Figure 2 and is explained in more detail
in the following subsections. For more
details on web service choices refer to Section 3.2.

For the user, the main entry point to the
Replication Services is the client interface, which is
provided via a Java API as well as a command line
interface, the edg-replica-manager module. For each
of the main components in Figure 1 the Reptor
framework provides the necessary interface. For instance,
the functionality of the core module consists mainly of
the file copy and cataloging process and is handled
in the client library with the respective calls to the
Transport and Replica Catalog modules.

2.3. Replica Location Service (RLS) 

The Replica Location Service (RLS) is the service 
responsible for maintaining a (possibly distributed) 
catalog of files registered in the Grid infrastructure. 
For each file there may exist several replicas. This is 
due to the need for geographically distributed copies of 
the same file, so that accesses from different points of 
the globe may be optimized (see section on the Replica 
Optimization Service). Obviously, one needs to keep 
track of the scattered replicas, so that they can be 
located and consistently updated. 

As such, the RLS is designed to store one-to-many
relationships between Grid Unique Identifiers
(GUIDs) and Physical File Names (PFNs). Since
many replicas of the same file may coexist (with
different PFNs), we identify them as being replicas of the
same file by assigning to them the same unique
identifier (the GUID).

Figure 2: Interaction of Replica Manager with other Grid components.



The RLS architecture encompasses two logical com- 
ponents - the LRC (Local Replica Catalog) and the 
RLI (Replica Location Index). The LRC stores the 
mappings between GUIDs and PFNs on a per-site ba- 
sis whereas the RLI stores information on where map- 
pings exist for a given GUID. In this way, it is pos- 
sible to split the search for replicas of a given file in 
two steps: in the first one the RLI is consulted in or- 
der to determine which LRCs contain mappings for a 
given GUID; in the second one, the specific LRCs are 
consulted in order to find the PFNs one is interested 
in. 
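The two-step search described above can be sketched as follows. This is a minimal in-memory illustration in Python, not the RLS implementation; the site names, GUIDs, PFNs and the function name are hypothetical.

```python
# Hypothetical in-memory stand-ins for the two RLS components.
# RLI: GUID -> set of sites whose LRC holds mappings for that GUID.
rli = {"guid-001": {"cern", "ral"}}

# LRCs: one catalog per site, each mapping GUID -> list of PFNs.
lrcs = {
    "cern": {"guid-001": ["gsiftp://se.cern.example/data/f1"]},
    "ral": {"guid-001": ["gsiftp://se.ral.example/store/f1"]},
    "nikhef": {},
}


def locate_replicas(guid):
    """Step 1: ask the RLI which LRCs to query. Step 2: query only those."""
    pfns = []
    for site in sorted(rli.get(guid, ())):
        pfns.extend(lrcs[site].get(guid, []))
    return pfns
```

Only the LRCs named by the index are contacted, so a site without mappings for the GUID (here the hypothetical "nikhef") is never queried.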

It is however worth mentioning that the LRC is
implemented to work in standalone mode, meaning that
it can act as a full RLS on its own if such a
deployment architecture is necessary. When working in
conjunction with one (or several) RLIs, the LRC provides
periodic updates of the GUIDs it holds mappings for.
These updates consist of Bloom filter objects, which
are a very compact representation of a set that
supports membership queries.
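To make the idea of a Bloom filter concrete, here is a minimal sketch in Python. The bit-array size, the number of hash functions and the use of SHA-256 are assumptions chosen for illustration; the paper does not state how the RLS filters are parameterized.

```python
import hashlib


class BloomFilter:
    """Compact set summary: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting a SHA-256 digest.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


# An LRC could summarize the GUIDs it holds and send only the filter.
lrc_summary = BloomFilter()
for guid in ("guid-001", "guid-002"):
    lrc_summary.add(guid)
```

A positive answer from the filter means "possibly present", so the LRC itself still has to be consulted for the actual mappings; a negative answer is definitive.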

The RLS currently has two possible database back- 
end deployment possibilities: MySQL and Oracle9i. 

2.4. Replica Metadata Catalog Service (RMC)

Although the RLS already provides
the necessary functionality for application clients,
GUIDs are difficult to read and remember.
The Replica Metadata Catalog (RMC) can
be considered as another layer of indirection on top
of the RLS that provides mappings between Logical
File Names (LFNs) and GUIDs. The LFNs are
user-defined aliases for GUIDs; many LFNs may exist for
one GUID.

Furthermore, the RMC is also capable of holding 
metadata about the original physical file represented 
by the GUID (e.g. size, date of creation, owner). It is 
also possible for the user to define specific metadata 
and attach it to a GUID or to an LFN. The purpose of 
this mechanism is to provide to users and applications 
a way of querying the file catalog based on a wide 
range of attributes. The possibility of gathering LFNs 
as collections and manipulating these collections as 
a whole has already been envisaged, but is not yet 
implemented. 
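The LFN-to-GUID indirection and attribute-based querying can be sketched as below. This is an in-memory Python illustration only; the names, the metadata attributes and both helper functions are hypothetical and do not reflect the RMC API.

```python
# Hypothetical catalog content: several LFNs may alias one GUID.
lfn_to_guid = {
    "lfn:higgs-run1": "guid-001",
    "lfn:my-favourite-file": "guid-001",
    "lfn:calib": "guid-002",
}

# Hypothetical metadata attached to GUIDs.
guid_metadata = {
    "guid-001": {"size": 2048, "owner": "jdoe"},
    "guid-002": {"size": 512, "owner": "asmith"},
}


def aliases(guid):
    """Return all LFNs that are aliases for the given GUID."""
    return sorted(lfn for lfn, g in lfn_to_guid.items() if g == guid)


def find_lfns(**attributes):
    """Return the LFNs whose GUID metadata matches every given attribute."""
    matching = {
        guid
        for guid, meta in guid_metadata.items()
        if all(meta.get(key) == value for key, value in attributes.items())
    }
    return sorted(lfn for lfn, g in lfn_to_guid.items() if g in matching)
```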

As for the RLS, the RMC supports MySQL and 
Oracle9i as database backends. 



2.5. Replica Optimization Service (ROS) 

The goal of the optimization service is to select the 
best replica with respect to network and storage ac- 
cess latencies. It is implemented as a light-weight web 
service that gathers information from the EDG net- 
work monitoring service and the EDG storage element 
service about the respective data access latencies. 

In earlier work we defined the APIs getNetworkCosts and
getSECosts for interactions of the Replica Manager
with the Network Monitoring and the Storage
Element Monitor. These two components monitor the
network traffic and the access traffic to the storage
device respectively and calculate the expected
transfer time of a given file with a specific size.

In the EU DataGrid project, Grid resources are
managed by the meta scheduler of WP1, the Resource
Broker. One of the goals of the Resource Broker
is to decide on which Computing Element the jobs
should be run such that the throughput of all jobs
is maximized. Assuming highly data intensive jobs,
a typical optimization strategy could be to select the
least loaded resource with the maximum amount of
locally available data. We introduced the Replica
Manager API getAccessCost, which returns the access
costs of a specific job for each candidate Computing
Element. The Resource Broker can then use this
information provided by the Replica Manager to
schedule each job to its optimal resources.
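One possible reading of such an access cost is sketched below in Python. The cost model (sum over the job's files of the cheapest estimated transfer time to the candidate Computing Element) is an assumption made for illustration, as are all names and numbers; the actual getAccessCost computation is not specified here.

```python
# Hypothetical estimated transfer times in seconds from a Storage
# Element (SE) to a Computing Element (CE).
transfer_time = {
    ("se-cern", "ce-cern"): 1.0,
    ("se-cern", "ce-ral"): 20.0,
    ("se-ral", "ce-cern"): 25.0,
    ("se-ral", "ce-ral"): 2.0,
}

# Hypothetical replica locations for the files a job needs.
replicas = {
    "guid-001": ["se-cern"],
    "guid-002": ["se-cern", "se-ral"],
}


def access_cost(guids, ce):
    """Assumed model: for each file take its cheapest replica, then sum."""
    return sum(
        min(transfer_time[(se, ce)] for se in replicas[guid])
        for guid in guids
    )


def rank_computing_elements(guids, candidates):
    """Order candidate CEs from cheapest to most expensive data access."""
    return sorted(candidates, key=lambda ce: access_cost(guids, ce))
```

A scheduler could combine such a ranking with load information before making its final choice.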

The interaction of the Replica Manager with the
Resource Broker, the Network Monitor and the
Storage Element Monitor is depicted in Figure 3.

2.6. Replica Subscription Service 

The Replica Subscription Service (RSS) provides
automatic replication based on a subscription model.
The basic design is based on our first generation
replication tool GDMP (Grid Data Mirroring
Package).

3. SQL Database Service: Spitfire 

Spitfire provides a means to access relational
databases from the Grid. This service has been
provided by our work package for some time and was
our first service that used the web service paradigm.
Thus, we give more details about its implementation
in Section 3.2, since many of the technology choices
for the replication services explained in the previous
section are based on choices also made for Spitfire.

3.1. Spitfire Overview 

The SQL Database service (named Spitfire) permits
convenient and secure storage, retrieval and querying
of data held in any local or remote RDBMS. The
service is optimized for metadata storage. The
primary SQL Database service has been re-architected
into a standard web service. This provides a platform
and language independent way of accessing the
information held by the service. The service exposes a
standard interface in WSDL format, from which client
stubs can be built in most common programming
languages, allowing a user application to invoke the
remote service directly. The interface provides the
common SQL operations to work with the data. Pre-built
client stubs exist for the Java, C and C++
programming languages. The service itself has been tested
with the MySQL and Oracle databases.

The earlier SQL Database service was primarily
accessed via a web browser (or command line) using
pre-defined server-side templates. This functionality,
while less flexible than the full web services interface,
was found to be very useful for web portals, providing
a standardized view of the data. It has therefore been
retained and re-factored into a separate SQL Database
browser module.

3.2. Component Description and Details about Web Service Design

There are three main components to the SQL
Database service: the primary server component, the
client(s) component, and the browser component.
Applications that have been linked to the SQL Database
client library communicate with a remote instance of
the server. This server is put in front of an RDBMS
(e.g. MySQL), and securely mediates all Grid access
to that database. The browser is a standalone web
portal that is also placed in front of an RDBMS.

The server is a fully compliant web service imple- 
mented in Java. It runs on Apache Axis inside a 
Java servlet engine (currently we use the Java refer- 
ence servlet engine, Tomcat, from the Apache Jakarta 
project). The service mediates the access to a RDBMS 
that must be installed independently from the service. 
The service is reasonably non-intrusive, and can be 
installed in front of a pre-existing RDBMS. The lo- 
cal database administrator retains full control of the 
database back-end, with only limited administration 
rights being exposed to properly authorized grid users. 

The web services client, at its most basic, consists 
of a WSDL service description that describes fully the 
interface. Using this WSDL description, client stubs 
can be generated automatically in the programming 
language of choice. We provide pre-built client stubs 
for the Java, C and C++ programming languages. 
These are packaged as Java JAR files and static li- 
braries for Java and C/C++ respectively. 

The browser component is a server side component 
that provides web-based access to the RDBMS. It pro- 
vides the functionality of the previous version of the 
SQL Database service. This service does not depend 
on the other components and can be used from any 
web browser. The browser component is implemented 
as a Java servlet. In the case where it is installed to- 
gether with the primary service, it is envisaged that 
both services will be installed inside the same servlet 
engine. 

The design of the primary service is similar to that
of the prototype Remote Procedure Call
GridDataService standard, and indeed influenced
the design of the standard. It is expected that the SQL
Database service will eventually evolve into a
prototype implementation of the RPC part of this GGF
standard. However, to maximise the usability and
portability of the service, we chose to implement it as
a plain web service, rather than just an OGSA service.
The architecture of the service has been designed so
that it will be straightforward to implement the OGSA
specification at a later date.

The communication between the client and server
components is over the HTTP(S) protocol. This
maximises the portability of the service, since this
protocol has many pre-existing applications that have been
heavily tested and are now very robust. The data
format is XML, with the request being wrapped using
standard SOAP Remote Procedure Call. The
interface is designed around the SQL query language.
The communication between the user's web browser
and the SQL Database Browser service is also over
HTTP(S).
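As an illustration of a request wrapped in a SOAP envelope, the following Python sketch builds a SOAP 1.1 message with the standard library. The service namespace, the operation name and its parameters are hypothetical; the real Spitfire interface is defined by its WSDL, which is not reproduced here.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical namespace for the service's operations.
SERVICE_NS = "urn:example:sql-service"


def build_request(operation, **params):
    """Wrap an RPC-style call and its parameters in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical operation modelled on an SQL SELECT.
xml_text = build_request("select", table="files", where="size > 1024")
```

In practice generated client stubs produce and parse such messages, so applications rarely construct them by hand.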

The server and browser components (and parts of
the Java client stub) make use of the common Java
Security module as described in Section 4. The secure
connection is made over HTTPS (HTTP with SSL or
TLS).

Both the server and browser have a service certifi- 
cate (they can optionally make use of the system's 
host certificate), signed by an appropriate CA, which 
they can use to authenticate themselves to the client. 
The client uses their GSI proxy to authenticate them- 
selves to the service. The user of the browser service 
should load their GSI certificate into the web browser, 
which will then use this to authenticate the user to the 
browser. 

A basic authorisation scheme is defined by default 
for the SQL Database service, providing administra- 
tive and standard user functionality. The authorisa- 
tion is performed using the subject name of the user's 
certificate (or a regular expression matching it). The 
service administrator can define a more complex au- 
thorisation scheme if necessary, as described in the 
security module documentation. 
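Authorization on the certificate subject name, either by exact name or by regular expression, can be sketched as follows. The rule list, the access levels and the function name are hypothetical and serve only to illustrate the matching step in Python.

```python
import re

# Hypothetical rules, checked in order: (subject pattern, access level).
RULES = [
    (re.compile(r"C=CH, O=cern, CN=Admin One"), "admin"),
    (re.compile(r"C=CH, O=cern, CN=.+"), "user"),
]


def access_level(subject):
    """Return the level of the first rule whose pattern matches the
    whole subject name, or None if no rule matches."""
    for pattern, level in RULES:
        if pattern.fullmatch(subject):
            return level
    return None
```

Matching the whole subject (rather than searching for a substring) avoids granting access to a name that merely contains an authorized one.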



4. Security 

The EDG Java security package covers two main
security areas, authentication and authorization.
Authentication assures that the entity (user, service or server)
at the other end of the connection is who it claims to
be. Authorization decides what the entity is allowed
to do.

The aim in the security package is always to make 
the software as flexible as possible and to take into 
account the needs of both EDG and industry to make 
the software usable everywhere. To this end there 
has been some research into similarities and possibili- 
ties for cooperation with for example Liberty Alliance, 
which is a consortium developing standards and solu- 
tions for federated identity for web based authentica- 
tion, authorization and payment. 

4.1. Authentication 

The authentication mechanism is an extension of
the normal Java SSL authentication mechanism. The
mutual authentication in SSL happens by exchanging
public certificates that are signed by trusted certificate
authorities (CAs). The user and the server prove that
they are the owners of their certificates by proving by
cryptographic means that they hold the private key
that matches the certificate.

In Grids the authentication is done using GSI proxy
certificates that are derived from the user certificate.
This proxy certificate comes close to fulfilling the
PKIX requirement for a valid certificate chain, but
does not fully follow the standard. This causes the
SSL handshake to fail in standard-conforming
implementations. For GSI proxy authentication to work, the SSL
implementation has to be nonstandard or has to be
changed to accept such proxies.

The EDG Java security package extends the Java 
SSL package. It 

• accepts the GSI proxies as the authentication 
method 

• supports GSI proxy loading with periodical 
reloading 

• supports OpenSSL certificate-private key pair 
loading 

• supports CRLs with periodical reloading 

• integrates with Tomcat 

• integrates with Jakarta Axis SOAP framework 

The GSI proxy support is done by finding the user
certificate and making special allowances and
restrictions for the proxy certificates that follow it. The
allowance is that the proxy certificate does not have
to be signed by a CA. The restriction is that the
distinguished name (DN) of the proxy certificate has to
start with the DN of the user certificate (e.g. 'C=CH,
O=cern, CN=John Doe'). This way the user cannot
pretend to be someone else by making a proxy with
DN 'C=CH, O=cern, CN=Jane Doe'. The proxies
are short lived, so the program using the SSL
connection may be running while the proxy is updated. For
this reason the user credentials (for example the proxy
certificate) can be set to be reloaded periodically.
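The DN restriction can be illustrated with a short Python sketch. It compares DN components rather than raw string prefixes and uses a naive comma split that ignores escaped commas; both choices, like the function names and the 'CN=proxy' suffix in the usage below, are simplifying assumptions and not the package's actual check.

```python
def parse_dn(dn):
    """Split a DN such as 'C=CH, O=cern, CN=John Doe' into components.
    Naive: assumes no escaped commas inside attribute values."""
    return [part.strip() for part in dn.split(",")]


def proxy_dn_allowed(user_dn, proxy_dn):
    """A proxy DN must extend the user DN by at least one component."""
    user, proxy = parse_dn(user_dn), parse_dn(proxy_dn)
    return len(proxy) > len(user) and proxy[: len(user)] == user


user_dn = "C=CH, O=cern, CN=John Doe"
ok = proxy_dn_allowed(user_dn, "C=CH, O=cern, CN=John Doe, CN=proxy")
forged = proxy_dn_allowed(user_dn, "C=CH, O=cern, CN=Jane Doe, CN=proxy")
```

Comparing whole components prevents a DN such as 'CN=John Doe2' from passing a plain string-prefix test against 'CN=John Doe'.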

OpenSSL saves the user credentials using two files, 
one for the user certificate and the other for the pri- 
vate key. With the EDG Java security package these 
credentials can be loaded easily. 

The CAs periodically release lists of revoked cer- 
tificates in a certificate revocation list (CRL). The 
EDG Java security package supports this CRL mech- 
anism and even if the program using the package is 
running, these lists can be periodically and automati- 
cally reloaded into the program by setting the reload 
interval. 

The integration with Jakarta Tomcat (a Java web
server and servlet container) is done with an interface
class; to use it, only the Jakarta Tomcat
configuration file has to be set up accordingly.

The Jakarta Axis SOAP framework provides an
easy way to change the underlying SSL socket
implementation on the client side. Only a simple interface
class was needed, and to turn it on a system variable
has to be set when calling the Java program. On the
server side the integration was even simpler, as Axis
runs on top of Tomcat and Tomcat can be set up as
above.

Due to issues of performance, many of the services 
described in this document have equivalent clients 
written in C++. To this end, there are several C++ 
SOAP clients that have been written based on the 
gSOAP library. In order to provide the same authen- 
tication and authorization functionality as in the cor- 
responding Java SOAP clients, an accompanying C 
library is being developed for gSOAP. When ready, it 
is to provide support for mutual authentication be- 
tween SOAP clients and SOAP servers, support for 
the coarse-grained authorization as implemented in 
the server end by the Authorization Manager (de- 
scribed below) and verification of both standard X509 
and GSI style server and server proxy certificates. 

4.2. Coarse grained authorization 

The EDG Java security package only implements 
the coarse grained authorization. The coarse grained 
authorization decision is made in the server before the 
actual call to the service and can make decisions such 
as 'what kind of access does this user have to that 
database table' or 'what kind of access does this user 
have to the file system'. The fine grained authoriza- 
tion that answers the question 'what kind of access 
does this user have to this file' can only be handled 
inside the service, because the actual file to access is 
only known during the execution of the service. The 
authorization mechanism is positioned in the server 
before the service. 

In the EDG Java security package the authorization 
is implemented as role based authorization. Currently 
the authorization is done in the server end and the 
server authorizes the user, but there are plans to do 
mutual authorization where also the client end checks 
that the server end is authorized to perform the ser- 
vice or to save the data. The mutual authorization 
is especially important in the medical field where the 
medical data can only be stored in trusted servers. 

The role based authorization happens in two stages.
First the system checks that the user can play the role
he requested (or whether there is a default role defined
for him). The role the user is authorized to play is then
mapped to a service specific attribute. The role
definitions can be the same in all the services in the
(virtual) organization, but the mapping from the role to
the attribute is service specific. The service specific
attribute can be, for example, a user id for file system
access or a database connection id with preconfigured
access rights. If either step fails, the user is not
authorized to access the service using the role he requested.
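The two stages can be sketched in Python as below. The data structures, role names and attribute values are hypothetical and only illustrate the flow: check the role first, then map it to a service-specific attribute.

```python
# Hypothetical role definitions, shared across a virtual organization.
USER_ROLES = {
    "C=CH, O=cern, CN=John Doe": {
        "roles": {"reader", "writer"},
        "default": "reader",
    },
}

# Hypothetical service-specific mapping from role to attribute,
# here a database connection id with preconfigured access rights.
ROLE_TO_ATTRIBUTE = {
    "reader": "db_conn_readonly",
    "writer": "db_conn_readwrite",
}


def authorize(subject, requested_role=None):
    """Stage 1: may the user play the requested (or default) role?
    Stage 2: map that role to the service-specific attribute.
    Returns None if either stage fails."""
    entry = USER_ROLES.get(subject)
    if entry is None:
        return None
    role = requested_role or entry.get("default")
    if role not in entry["roles"]:
        return None
    return ROLE_TO_ATTRIBUTE.get(role)
```

Because only the second table is service specific, the same role definitions can be reused by every service in the organization.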

There are two modules to interface to the informa- 
tion flow between the client and the service; one for 
normal HTTP web traffic and the other for SOAP web 
services. The authorization mechanism can attach to 
other information flows by writing a simple interface 
module for them. 

In a similar fashion the authorization information
that is used to make the authorization decisions can
be stored in several ways. For simple and small
installations and for testing purposes the information can be
a simple XML file. For larger installations the
information can be stored in a database, and when using
the Globus tools to distribute the authorization
information, the data is stored in a text file that is called
the gridmap file. For each of these stores there is
a module to handle the specifics of that store; to
add a new way to store the authorization information,
only an interface module needs to be written. When
the virtual organization membership service (VOMS)
is used, the information provided by the VOMS server
can be used for the authorization decisions, and all the
information from the VOMS is parsed and forwarded
to the service.



4.3. Administration web interface 

The authorization information usually ends up be- 
ing rather complex, and maintaining that manually 
would be difficult, so a web based administration in- 
terface was created. This helps to understand the au- 
thorization configuration, eases the remote manage- 
ment and by making management easier improves the 
security. 



5. Conclusions 

The second generation of our data management ser- 
vices has been designed and implemented based on 
the web service paradigm. In this way, we have a 
flexible and extensible service framework and are thus 
prepared to follow the general trend of the upcoming 
OGSA standard that is based on web service tech- 
nology. Since interoperability of services seems to be 
a key feature in the upcoming years, we believe that 
our approach used in the second generation of data 
management is compatible with the need for service 
interoperability in a rapidly changing Grid environ- 
ment. 

Our design choices have been as follows: we aim to
support robust, highly available commercial
products (like Oracle/DB and Oracle/Application Server)
as well as standard open source technology (MySQL,
Tomcat, etc.).

The first experience in using the new generation 
of services shows that basic performance expectations 
are met. During this year, the services will be de- 
ployed on the EDG testbed (and possibly others): this 
will show the strength and the weaknesses of the ser- 
vices. 